Abstract

This paper proposes a novel algorithm for signal classification
problems. We consider a non-stationary random signal whose samples
can be classified into several different classes, where the samples
in each class are independent and identically distributed with an
unknown probability distribution. The problem is to estimate the
probability distributions of the classes and the correct membership
of the samples to the classes. We propose a signal classification
method based on the data compression principle that accurate
estimation in the classification problem induces the optimal signal
models for data compression. The method formulates the
classification problem as an optimization problem in which a
so-called "classification gain" is maximized. To circumvent the
difficulties of integer optimization, we propose an algorithm based
on continuous relaxation. We prove that the continuous relaxation
incurs an asymptotically vanishing optimality loss. Simulation
results show that the proposed algorithm is effective, robust, and
has low computational complexity. The proposed algorithm can be
applied to various multimedia signal segmentation, analysis, and
pattern recognition problems.



1 Introduction 

Natural multimedia signals are non-stationary. For example, the
statistical properties of a natural image can vary significantly
across edges; an audio signal may contain silent segments and active
segments; and the statistics of a video signal can be totally
different before and after a change of scene.

Therefore, it is not surprising that signal classification problems
arise naturally in many scenarios of multimedia signal processing.
That is, the signal samples need to be classified into different
classes, where each class contains only signals with homogeneous
statistics. Such signal classification problems have been
extensively discussed under the name of thresholding or segmentation
(see for instance [9], [10]). The applications range from multimedia
signal enhancement to multimedia content analysis and understanding.

In this paper, we propose an original signal classification method
based on data compression principles. The central idea of our
approach is that signal classification can be considered an
operation of signal modeling. If a mismatched signal model is used
in data compression, then a performance loss in terms of coding
efficiency is incurred. Therefore, an accurate signal classification
result should maximize the coding efficiency in data compression.

Based on the above data compression principles, we propose an
optimization formulation of the signal classification problem. In
this formulation, the optimization variables are the memberships of
the samples to the different classes, and the objective function is
the coding efficiency. More precisely, we optimize the
classification gain, which is a measure of coding efficiency. To
avoid the difficulties of discrete optimization, we further propose
a continuous relaxation and random rounding solution for the
optimization problem. We prove that the optimality loss due to
relaxation vanishes when the total sample number is large.

In the data compression literature, the adaptive coding approach
based on classification has been previously discussed. Early works
on classifying DCT and wavelet coefficients into classes and using
an individual quantizer for each class include [2], [14], [12]. The
term classification gain was coined by Joshi, Jafarkhani, Kasner,
Fischer, Farvardin, Marcelin, and Bamberger [7]. Two signal
classification algorithms, maximum classification gain and equal
mean-normalized standard deviation classification, were proposed in
[7]. Signal classification approaches for adaptive coding have also
been adopted in state-of-the-art subband coding schemes (see for
instance [15]).

The signal classification problem can also be considered as an
unsupervised pattern recognition problem. In the pattern recognition
literature, clustering algorithms for such problems have been
previously discussed. Well-known algorithms include the K-means
algorithms and the Expectation-Maximization (EM) algorithms [5],
[11]. In the K-means algorithms, the classification problem is
formulated as an optimization problem, where a sum of distances is
minimized by an iterative approach. The classification problem can
also be formulated as a problem of estimation with incomplete data,
and solved by the EM algorithms [4]. In the EM algorithms, the log
likelihoods of the estimated distribution parameters are iteratively
lower bounded and maximized.

There are several advantages of the proposed algorithm over previous
algorithms. First, the proposed algorithm is "provably good", i.e.,
the algorithm is amenable to rigorous theoretical analysis. Second,
the proposed algorithm is more tractable because difficult integer
optimizations are avoided. Third, the proposed approach is a more
general and flexible framework. For example, it is more flexible in
the choice of optimization solver. The proposed algorithm can
achieve globally optimal solutions if a global optimization solver
is used, whereas both the K-means algorithms and the EM algorithms
converge to locally optimal solutions.

Our contribution: In summary, we propose a novel principle of signal
classification based on data compression. We propose a continuous
relaxation solution for the optimization formulation. We prove that
the optimality loss due to continuous relaxation vanishes
asymptotically with respect to the sample number.

Organization of the paper: The rest of this paper is organized as
follows. We discuss the signal model in Section 2. A review of
classification gain is provided in Section 3. We present the
proposed classification algorithm in Section 4. We present a
theoretical discussion of the optimality loss due to continuous
relaxation in Section 5. Numerical results are presented in
Section 6. Concluding remarks are given in Section 7.

2 Signal Model 

In this paper, we consider a random signal with $N$ samples,
$x_1, x_2, \ldots, x_N$. We assume that the random signal is
non-stationary and is a mixture of samples from $J$ memoryless
information sources. The probability distribution of the $i$-th
information source is $P_i$. For each $n$, $1 \le n \le N$, the
random variable $x_n$ is distributed according to one of the
distributions $P_i$ and is independent of all the other signal
samples.

The signal classification problem considered here is thus to
estimate the probability distribution of each information source and
the membership of each signal sample. We assume that the probability
distributions of all information sources are unknown, i.e., we
consider a blind signal classification scenario. The case in which
the probability distribution of each information source is known can
be solved in a straightforward manner by using the first principles
of statistical detection and estimation theory, and thus will not be
discussed. In this paper, we also assume that all information
sources are Gaussian distributed. The generalization of the proposed
algorithm to non-Gaussian cases (and also an information theoretic
analysis of the algorithm) will be discussed in a companion paper
[8].

3 Classification Gain 

According to rate-distortion theory [3], for a memoryless Gaussian
information source with variance $\sigma^2$, if an encoder with rate
$R$ is used, then the smallest achievable mean-squared error
distortion is

$$D(R) = \sigma^2 2^{-2R}. \tag{1}$$

The function $D(R)$ is the distortion-rate function for Gaussian
information sources. For non-Gaussian information sources with the
same variance $\sigma^2$, the distortion $D(R)$ is achievable by
using a source encoder designed for Gaussian sources [8].

For the non-stationary random signal $x_1, x_2, \ldots, x_N$, there
are two approaches to encoding the signal. A naive approach adopts
an encoder designed for Gaussian sources to encode all signal
samples. The achievable distortion is

$$\sigma^2 2^{-2R}, \tag{2}$$

where $\sigma^2$ is the variance of the random signal
$x_1, \ldots, x_N$.
A better approach first classifies the signal samples into $J$
different classes, and then uses an individual encoder for each
class of samples. Denote the number of samples in the $i$-th class
by $N_i$. Define $p_i$ as the fraction $p_i = N_i/N$. Denote the
variance of samples in the $i$-th class by $\sigma_i^2$. Under an
arbitrary rate allocation, the achievable expected distortion is

$$\sum_{i=1}^{J} p_i \sigma_i^2 2^{-2R_i}, \tag{3}$$

where $R_i$ is the rate allocated to encode the samples in the
$i$-th class,

$$\sum_{i=1}^{J} p_i R_i = R - H(p_1, p_2, \ldots, p_J), \tag{4}$$

and $H(p_1, p_2, \ldots, p_J)$ is the entropy function with base 2.
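It can be easily shown by using the Lagrange multiplier method that
the optimal rate allocation satisfies the following condition

$$R_i = \max\left\{ \frac{1}{2} \log_2 \frac{\sigma_i^2}{\theta},\ 0 \right\}, \tag{5}$$

where $\theta$ is a constant chosen so that the rate constraint (4)
is met.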
Assume that the rate $R$ is sufficiently high, so that $R_i > 0$
for all $i$, $1 \le i \le J$. Then, the optimal achievable
distortion is

$$\left( \prod_{i=1}^{J} (\sigma_i^2)^{p_i} \right) 2^{2H(p_1, \ldots, p_J)} \, 2^{-2R}. \tag{6}$$

As in previous research, we define the classification gain as the
ratio of the two achievable distortions (2) and (6),

$$\frac{\sigma^2}{\left( \prod_{i=1}^{J} (\sigma_i^2)^{p_i} \right) 2^{2H(p_1, \ldots, p_J)}}. \tag{7}$$
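As an illustration, the following is a minimal sketch of how the classification gain could be evaluated for a given labelling. It assumes the high-rate regime, in which the ratio of the naive distortion (2) to the per-class distortion reduces to $\sigma^2 / \big( \prod_i (\sigma_i^2)^{p_i} \, 2^{2H(p_1, \ldots, p_J)} \big)$ and no longer depends on $R$. The function names `entropy_bits` and `classification_gain` are hypothetical, and every class is assumed to be non-empty with non-zero variance.

```python
import numpy as np

def entropy_bits(p):
    """Entropy H(p_1, ..., p_J) in bits; zero-probability entries contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def classification_gain(x, labels):
    """Naive distortion divided by per-class distortion (high-rate regime).

    Assumption: each class present in `labels` has non-zero sample variance.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # p_i = N_i / N and the (population) variance of each class
    p = np.array([np.mean(labels == c) for c in classes])
    var = np.array([np.var(x[labels == c]) for c in classes])
    # log2 of the denominator: sum_i p_i log2(sigma_i^2) + 2 H(p)
    log2_den = np.sum(p * np.log2(var)) + 2.0 * entropy_bits(p)
    return float(np.var(x) / 2.0 ** log2_den)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(128, 4, 128), rng.normal(16, 4, 128)])
    print(classification_gain(x, np.repeat([1, 2], 128)))
```

Putting all samples in a single class gives a gain of one, and a labelling that does not separate the sources can give a gain below one, because the entropy term charges for the class information.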



4 Classification Algorithm 

In this section, we present the proposed signal classifica- 
tion algorithm. The algorithm is based on the principle that 
the optimal classification induces the optimal signal model 
for data compression (a rigorous treatment of this argument 
can be found in [8]). We formulate the classification prob- 
lem as an integer optimization where the classification gain 
is maximized. 

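The integer optimization is as follows.

$$\text{(Integer)} \quad \min \ \left( \prod_{i=1}^{J} (\sigma_i^2)^{p_i} \right) 2^{2H(p_1, \ldots, p_J)} \tag{8}$$

Subject to:

$$\mu_i = \frac{\sum_{n=1}^{N} a_{ni} x_n}{\sum_{n=1}^{N} a_{ni}} \tag{9}$$

$$\sigma_i^2 = \frac{\sum_{n=1}^{N} a_{ni} (x_n - \mu_i)^2}{\sum_{n=1}^{N} a_{ni}} \tag{10}$$

$$p_i = \frac{\sum_{n=1}^{N} a_{ni}}{N} \tag{11}$$

$$\sum_{i=1}^{J} a_{ni} = 1, \quad \text{for any } n, \ 1 \le n \le N \tag{12}$$

$$a_{ni} \in \{0, 1\} \tag{13}$$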

In the integer optimization, the optimization variables are the
variables $a_{ni}$, $1 \le n \le N$, $1 \le i \le J$. Each variable
$a_{ni}$ is a binary variable indicating the membership of the
$n$th sample, i.e.,

$$a_{ni} = \begin{cases} 1, & \text{if the $n$th sample is classified to the $i$th class} \\ 0, & \text{otherwise} \end{cases} \tag{14}$$

Alternatively, we can also use a set of integers
$z_1, z_2, \ldots, z_N$ to represent the membership of the signal
samples. The integer $z_n = i$ if and only if the $n$th signal
sample is classified to the $i$th class. In the sequel, we will
call such a set of integers $z_1, \ldots, z_N$ a classification
scheme.
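The two equivalent representations of a classification (the binary indicators $a_{ni}$ and the integer labels $z_n$) can be converted into each other as in the following minimal sketch. The helper names are hypothetical, and class labels are assumed to run from 1 to $J$ as in the text.

```python
import numpy as np

def scheme_to_indicators(z, J):
    """Map labels z_n in {1, ..., J} to the N x J binary matrix a_ni."""
    z = np.asarray(z, dtype=int)
    a = np.zeros((z.size, J), dtype=int)
    # Row n gets a single 1 in column z_n - 1 (0-based storage of class z_n)
    a[np.arange(z.size), z - 1] = 1
    return a

def indicators_to_scheme(a):
    """Inverse map: z_n is the 1-based column holding the single 1 in row n."""
    a = np.asarray(a)
    return np.argmax(a, axis=1) + 1

if __name__ == "__main__":
    a = scheme_to_indicators([1, 3, 2, 1], J=3)
    print(a)
    print(indicators_to_scheme(a))
```

Each row of the indicator matrix sums to one, which is the assignment constraint of the integer program.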



Because integer programs are generally difficult to solve, we
propose a relaxation and random rounding approach. The relaxed
program is as follows.

$$\text{(Relaxation)} \quad \min \ \left( \prod_{i=1}^{J} (\sigma_i^2)^{p_i} \right) 2^{2H(p_1, \ldots, p_J)} \tag{15}$$

Subject to:

$$\mu_i = \frac{\sum_{n=1}^{N} a_{ni} x_n}{\sum_{n=1}^{N} a_{ni}} \tag{16}$$

$$\sigma_i^2 = \frac{\sum_{n=1}^{N} a_{ni} (x_n - \mu_i)^2}{\sum_{n=1}^{N} a_{ni}} \tag{17}$$

$$p_i = \frac{\sum_{n=1}^{N} a_{ni}}{N} \tag{18}$$

$$\sum_{i=1}^{J} a_{ni} = 1, \quad \text{for any } n, \ 1 \le n \le N \tag{19}$$

$$0 \le a_{ni} \le 1 \tag{20}$$

In the relaxed program, the 0-1 constraints have been relaxed to
box constraints.
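To make the relaxed objective concrete, the sketch below evaluates it for an arbitrary $N \times J$ membership matrix with entries in $[0, 1]$. Working with the base-2 logarithm of the objective is a choice made here for numerical convenience (the logarithm is monotone, so the minimizer is unchanged); it is not stated in the paper. The function name `log2_objective` is hypothetical, and every class is assumed to have positive total membership and positive variance.

```python
import numpy as np

def log2_objective(a, x):
    """log2 of prod_i (sigma_i^2)^{p_i} * 2^{2 H(p)} for an N x J matrix a.

    Assumptions: each column of `a` has a positive sum and yields a
    positive weighted variance; rows of `a` sum to one.
    """
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    mass = a.sum(axis=0)                                   # sum_n a_ni
    p = mass / x.size                                      # occurrence probabilities
    mu = (a * x[:, None]).sum(axis=0) / mass               # weighted class means
    var = (a * (x[:, None] - mu) ** 2).sum(axis=0) / mass  # weighted class variances
    entropy = -np.sum(p * np.log2(p))                      # H(p) in bits
    return float(np.sum(p * np.log2(var)) + 2.0 * entropy)

if __name__ == "__main__":
    x = [0.0, 1.0, 10.0, 11.0]
    hard = [[1, 0], [1, 0], [0, 1], [0, 1]]   # a 0-1 (integer-feasible) point
    soft = [[0.5, 0.5]] * 4                   # an interior point of the box
    print(log2_objective(hard, x), log2_objective(soft, x))
```

For a 0-1 matrix the function returns the integer-program objective, so the same routine can be handed to a continuous solver and also used to score a rounded solution.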

The proposed algorithm is summarized in Algorithm 1. In the first
step, the relaxed optimization is solved. Denote the solution of the
relaxed optimization by $a^*_{ni}$. In the random rounding step, we
randomly set $z_n$ according to the values of $a^*_{ni}$. That is,
$\mathbb{P}(z_n = i) = a^*_{ni}$.

Algorithm 1 The blind signal classification algorithm

procedure BLIND CLASSIFICATION($x_1, x_2, \ldots, x_N$, $J$)
    solve the relaxed optimization problem
    for $n \leftarrow 1, N$ do
        randomly set $z_n = i$ with probability $a^*_{ni}$
    end for
    return classification scheme $z_1, z_2, \ldots, z_N$
end procedure
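The random rounding step of Algorithm 1 can be sketched as follows, assuming the relaxed solution $a^*_{ni}$ has already been obtained from some solver and is passed in as an $N \times J$ array. The function name `random_rounding` is hypothetical. The clipping and renormalization of each row are additions made here to guard against solver round-off and are not part of the algorithm as stated.

```python
import numpy as np

def random_rounding(a_star, rng=None):
    """Draw z_n in {1, ..., J} independently with P(z_n = i) = a*_ni."""
    rng = np.random.default_rng() if rng is None else rng
    a_star = np.clip(np.asarray(a_star, dtype=float), 0.0, None)
    N, J = a_star.shape
    z = np.empty(N, dtype=int)
    for n in range(N):
        row = a_star[n] / a_star[n].sum()   # renormalize (assumed safeguard)
        z[n] = rng.choice(J, p=row) + 1     # shift to 1-based class labels
    return z

if __name__ == "__main__":
    a_star = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
    print(random_rounding(a_star, np.random.default_rng(0)))
```

If the relaxed solution is already integral, the rounding is deterministic and returns the corresponding classification scheme.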



5 Performance Analysis 

In this section, we present a performance analysis of the proposed
classification algorithm. We show that the optimality loss due to
relaxation and random rounding is negligible if the total sample
number $N$ is sufficiently large. Therefore, our algorithm is
near-optimal with reduced computational complexity.

We need the inequality in Lemma 5.1 in our discussion. The
inequality is one variation of the Azuma inequality proven by Janson
[1][6].

Lemma 5.1 [6] (Azuma Inequality) Let $Z_1, \ldots, Z_N$ be
independent random variables, with $Z_k$ taking values in a set
$A_k$. Assume that a (measurable) function
$f : A_1 \times A_2 \times \cdots \times A_N \to \mathbb{R}$
satisfies the following Lipschitz condition (L).

• (L) If the vectors $z, z' \in \prod_{k=1}^{N} A_k$ differ only in
the $k$th coordinate, then $|f(z) - f(z')| \le c_k$,
$k = 1, \ldots, N$.

Then, the random variable $X = f(Z_1, \ldots, Z_N)$ satisfies, for
any $t \ge 0$,

$$\mathbb{P}(X \ge \mathbb{E}X + t) \le \exp\left( \frac{-2t^2}{\sum_{k=1}^{N} c_k^2} \right), \tag{21}$$

$$\mathbb{P}(X \le \mathbb{E}X - t) \le \exp\left( \frac{-2t^2}{\sum_{k=1}^{N} c_k^2} \right). \tag{22}$$
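A quick Monte Carlo sanity check of the lemma, for the simplest case that appears in the analysis, is sketched below. It assumes the bound has the form $\exp(-2t^2 / \sum_k c_k^2)$ and takes $X$ to be a sum of $N$ independent Bernoulli variables, so that $c_k = 1$ and, with $t = \epsilon N$, the two-sided bound becomes $2\exp(-2\epsilon^2 N)$. The function names and the chosen parameters are illustrative only.

```python
import numpy as np

def deviation_frequency(probs, eps, trials, rng):
    """Empirical frequency of |sum_n B_n - sum_n probs_n| > eps * N,
    where the B_n are independent Bernoulli(probs_n) variables."""
    N = probs.size
    draws = rng.random((trials, N)) < probs     # each row is one realisation
    dev = np.abs(draws.sum(axis=1) - probs.sum())
    return float(np.mean(dev > eps * N))

def azuma_bound(eps, N):
    """Two-sided bound 2 exp(-2 t^2 / sum_k c_k^2) with t = eps N and c_k = 1."""
    return float(2.0 * np.exp(-2.0 * eps ** 2 * N))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.uniform(size=200)
    print(deviation_frequency(probs, 0.1, 2000, rng), azuma_bound(0.1, 200))
```

The empirical frequency is typically well below the bound, which reflects that the inequality holds for every distribution satisfying the Lipschitz condition and is therefore not tight for any particular one.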



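As in the previous sections, we use $a^*_{ni}$ to denote the
solution of the relaxed program. We use $p_i^*$, $(\sigma_i^*)^2$,
$\mu_i^*$ to denote the corresponding occurrence probability,
variance, and mean. That is,

$$\mu_i^* = \frac{\sum_{n=1}^{N} a^*_{ni} x_n}{\sum_{n=1}^{N} a^*_{ni}}, \tag{23}$$

$$(\sigma_i^*)^2 = \frac{\sum_{n=1}^{N} a^*_{ni} (x_n - \mu_i^*)^2}{\sum_{n=1}^{N} a^*_{ni}}, \tag{24}$$

$$p_i^* = \frac{\sum_{n=1}^{N} a^*_{ni}}{N}. \tag{25}$$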

We use $z_1, \ldots, z_N$ to denote the classification scheme
obtained from Algorithm 1. In the following, we abuse the notation
and use $a_{ni}$ to denote the randomly rounded version of the
variable $a^*_{ni}$, i.e.,

$$a_{ni} = \begin{cases} 1, & \text{if } z_n = i \\ 0, & \text{otherwise} \end{cases} \tag{26}$$



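Similarly, we use $p_i$, $\sigma_i^2$, $\mu_i$ to denote the
corresponding occurrence probability, variance, and mean. That is,

$$\mu_i = \frac{\sum_{n=1}^{N} a_{ni} x_n}{\sum_{n=1}^{N} a_{ni}}, \tag{27}$$

$$\sigma_i^2 = \frac{\sum_{n=1}^{N} a_{ni} (x_n - \mu_i)^2}{\sum_{n=1}^{N} a_{ni}}, \tag{28}$$

$$p_i = \frac{\sum_{n=1}^{N} a_{ni}}{N}. \tag{29}$$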



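Definition: Let $\epsilon_1, \epsilon_2, \epsilon_3$ be arbitrary
positive real numbers. We say that a classification scheme is
$(\epsilon_1, \epsilon_2, \epsilon_3)$-typical if the following
conditions hold for all $i$, $1 \le i \le J$,

$$\left| \sum_{n=1}^{N} a_{ni} - \sum_{n=1}^{N} a^*_{ni} \right| \le \epsilon_1 N, \tag{30}$$

$$\left| \sum_{n=1}^{N} a_{ni} x_n - \sum_{n=1}^{N} a^*_{ni} x_n \right| \le \epsilon_2 N, \tag{31}$$

$$\left| \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*)^2 - \sum_{n=1}^{N} a^*_{ni} (x_n - \mu_i^*)^2 \right| \le \epsilon_3 N. \tag{32}$$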



Lemma 5.2 If $\epsilon_1, \epsilon_2, \epsilon_3$ all go to zero,
then for $(\epsilon_1, \epsilon_2, \epsilon_3)$-typical
classification schemes, $\mu_i$, $p_i$, $\sigma_i^2$ go to
$\mu_i^*$, $p_i^*$, $(\sigma_i^*)^2$ respectively.

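Proof It can be easily checked that $\mu_i$ goes to $\mu_i^*$, and
$p_i$ goes to $p_i^*$. For $\sigma_i^2$, we notice that

\begin{align}
& \sum_{n=1}^{N} a_{ni} (x_n - \mu_i)^2 \tag{33} \\
&= \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^* + \mu_i^* - \mu_i)^2 \tag{34} \\
&= \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*)^2 + \sum_{n=1}^{N} a_{ni} (\mu_i^* - \mu_i)^2 \tag{35} \\
&\quad + 2 \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*)(\mu_i^* - \mu_i) \tag{36} \\
&= \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*)^2 + p_i N (\mu_i^* - \mu_i)^2 \tag{37} \\
&\quad + 2 (\mu_i^* - \mu_i) \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*) \tag{38} \\
&= \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*)^2 - p_i N (\mu_i - \mu_i^*)^2. \tag{39}
\end{align}

Therefore,

\begin{align}
\sigma_i^2 &= \frac{\sum_{n=1}^{N} a_{ni} (x_n - \mu_i)^2}{\sum_{n=1}^{N} a_{ni}} \tag{40} \\
&= \frac{\sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*)^2}{\sum_{n=1}^{N} a_{ni}} - (\mu_i - \mu_i^*)^2. \tag{41}
\end{align}

By conditions (30) and (32), the first term goes to
$(\sigma_i^*)^2$, and the second term goes to zero because $\mu_i$
goes to $\mu_i^*$. It follows that $\sigma_i^2$ goes to
$(\sigma_i^*)^2$. ∎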
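The algebraic identity at the heart of this proof, $\sum_n a_{ni}(x_n - \mu_i)^2 = \sum_n a_{ni}(x_n - \mu_i^*)^2 - p_i N (\mu_i - \mu_i^*)^2$, holds for any reference value in place of $\mu_i^*$ and can be checked numerically. The sketch below does so for a single class; the function name is hypothetical, and the reference mean passed in is an arbitrary number chosen for illustration.

```python
import numpy as np

def variance_identity_sides(a, x, mu_star):
    """Return both sides of
    sum_n a_n (x_n - mu)^2 = sum_n a_n (x_n - mu_star)^2 - (sum_n a_n)(mu - mu_star)^2
    for one class, where mu is the a-weighted mean of x and sum_n a_n = p_i N."""
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    mass = a.sum()                      # equals p_i * N
    mu = (a * x).sum() / mass           # weighted mean of the class
    lhs = (a * (x - mu) ** 2).sum()
    rhs = (a * (x - mu_star) ** 2).sum() - mass * (mu - mu_star) ** 2
    return float(lhs), float(rhs)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    a = rng.integers(0, 2, 100)         # a random 0-1 membership column
    print(variance_identity_sides(a, x, mu_star=0.3))
```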



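Theorem 5.3 Let $\epsilon_1, \epsilon_2, \epsilon_3$ be arbitrary
positive real numbers. Let $V = \max_n x_n - \min_n x_n$. Then, the
probability that the classification scheme obtained from
Algorithm 1 is not $(\epsilon_1, \epsilon_2, \epsilon_3)$-typical is
upper bounded as follows.

\begin{align}
& \mathbb{P}\left( \text{the classification scheme is not } (\epsilon_1, \epsilon_2, \epsilon_3)\text{-typical} \right) \notag \\
&\le 2J \exp\left( -2 \epsilon_1^2 N \right) + 2J \exp\left( -\frac{2 \epsilon_2^2 N}{V^2} \right) \tag{42} \\
&\quad + 2J \exp\left( -\frac{2 \epsilon_3^2 N}{V^4} \right). \tag{43}
\end{align}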
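Proof By using the Azuma inequality, we can show that, for each $i$,

$$\mathbb{P}\left( \left| \sum_{n=1}^{N} a_{ni} - \sum_{n=1}^{N} a^*_{ni} \right| > \epsilon_1 N \right) \le 2 \exp\left( -2 \epsilon_1^2 N \right), \tag{44}$$

$$\mathbb{P}\left( \left| \sum_{n=1}^{N} a_{ni} x_n - \sum_{n=1}^{N} a^*_{ni} x_n \right| > \epsilon_2 N \right) \le 2 \exp\left( -\frac{2 \epsilon_2^2 N}{V^2} \right), \tag{45}$$

$$\mathbb{P}\left( \left| \sum_{n=1}^{N} a_{ni} (x_n - \mu_i^*)^2 - \sum_{n=1}^{N} a^*_{ni} (x_n - \mu_i^*)^2 \right| > \epsilon_3 N \right) \le 2 \exp\left( -\frac{2 \epsilon_3^2 N}{V^4} \right).$$

The theorem follows from a union bound. ∎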
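To get a feel for how large $N$ must be before the bound of Theorem 5.3 becomes informative, the right-hand side can be evaluated directly. The sketch below assumes the bound is the sum of the three terms $2J\exp(-2\epsilon_1^2 N)$, $2J\exp(-2\epsilon_2^2 N / V^2)$, and $2J\exp(-2\epsilon_3^2 N / V^4)$; the function name and the example parameter values are illustrative only.

```python
import math

def atypicality_bound(eps1, eps2, eps3, N, V, J):
    """Upper bound on the probability that the rounded scheme is not typical."""
    return 2.0 * J * (
        math.exp(-2.0 * eps1 ** 2 * N)
        + math.exp(-2.0 * eps2 ** 2 * N / V ** 2)
        + math.exp(-2.0 * eps3 ** 2 * N / V ** 4)
    )

if __name__ == "__main__":
    # Unit dynamic range: the bound decays quickly with N
    print(atypicality_bound(0.1, 0.1, 0.1, N=1000, V=1.0, J=2))
    # Large dynamic range and few samples: the bound exceeds one
    print(atypicality_bound(0.1, 0.1, 0.1, N=256, V=255.0, J=2))
```

Because of the $V^2$ and $V^4$ factors, the bound is only meaningful when $N$ is large relative to $V^4 / \epsilon_3^2$; for a fixed tolerance, rescaling the signal to a smaller dynamic range changes the sample size at which the bound drops below one.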

Corollary 5.4 If the sample number $N$ is sufficiently large, then
the classification scheme obtained from Algorithm 1 is
$(\epsilon_1, \epsilon_2, \epsilon_3)$-typical with probability
close to one.

Proof The upper bound in Theorem 5.3 is close to zero for
sufficiently large $N$. ∎

Corollary 5.5 If the sample number $N$ is sufficiently large, then
there exists at least one $(\epsilon_1, \epsilon_2, \epsilon_3)$-typical
classification scheme.



Proof We have presented an algorithm that constructs such a
classification scheme with success probability close to one. ∎

Remark 1 Theorem 5.3 and Corollary 5.5 imply that the gap between
the optimal classification gain achieved in the relaxed optimization
and the optimal classification gain achieved in the integer
optimization goes to zero asymptotically. In other words, the
continuous relaxation incurs an asymptotically vanishing optimality
loss.

6 Numerical Results

In this section, we present numerical results for the proposed blind
classification algorithm. The classification error probabilities are
measured by false classification ratios, $r_i = m_i/n_i$, where
$n_i$ denotes the number of samples belonging to class $i$, and
$m_i$ denotes the number of samples that belong to class $i$ but are
classified into classes other than class $i$. The IPOPT package is
used to solve the optimization problems [13].

In Fig. 1, we depict the result of the proposed algorithm for a
one-dimensional mixed signal of two classes, with one class having
mean 128 and variance 16, and the other class having mean 16 and
variance 16. The false classification ratios are all 0%. In Fig. 2,
we depict the result of the proposed algorithm for a one-dimensional
mixed signal of two classes, with one class having mean 128 and
variance 2500, and the other class having mean 128 and variance 25.
The false classification ratios are 16.41% and 6.25%. In Fig. 3, we
depict the result of the proposed algorithm for a one-dimensional
mixed signal of two classes, with one class having mean 50 and
variance 2500, and the other class having mean 5 and variance 25.
The false classification ratios are 10.16% and 3.91%. In each
figure, the signal is shown in the upper part of the figure, and the
classification result is shown in the lower part. The grey region of
the bar indicates the samples classified into one class, and the
white region of the bar indicates the samples classified into the
other class. In all three cases, the signal sample number is
$N = 256$.
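The false classification ratio $r_i = m_i/n_i$ used in this section can be computed from the true and estimated memberships as in the following minimal sketch. The function name is hypothetical. The sketch assumes that every class is non-empty and that the estimated class indices have already been matched to the true ones; since a blind classifier determines the classes only up to a relabelling, this matching is an extra step not shown here.

```python
import numpy as np

def false_classification_ratios(true_labels, predicted_labels, J):
    """Return [r_1, ..., r_J] with r_i = m_i / n_i.

    n_i: number of samples that truly belong to class i.
    m_i: number of those samples assigned to a class other than i.
    """
    t = np.asarray(true_labels)
    z = np.asarray(predicted_labels)
    ratios = []
    for i in range(1, J + 1):
        members = t == i
        n_i = int(members.sum())
        m_i = int(np.sum(members & (z != i)))
        ratios.append(m_i / n_i)
    return ratios

if __name__ == "__main__":
    truth = [1, 1, 1, 1, 2, 2, 2, 2]
    estimate = [1, 1, 1, 2, 2, 2, 2, 2]
    print(false_classification_ratios(truth, estimate, J=2))
```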



Figure 1. One dimensional, two classes. 

In Fig. 4, we depict the result of the proposed algorithm 
for a two dimensional mixed signal of two classes, with one 
class having mean 200 and variance 400, and the other class 
having mean 5 and variance 400. The signal is shown in the 
left part of the figure. The classification result is shown in 
the right part of the figure. The false classification ratios are 
1.93% and 0.52%. The size of the image is 32 by 32 pixels. 

Figure 2. One dimensional, two classes.

Figure 3. One dimensional, two classes.

Figure 4. Two dimensional, two classes.

In summary, we find that the proposed classification algorithm is
effective and robust. The algorithm has low computational
complexity.

7 Conclusion

This paper proposes a blind classification algorithm for
non-stationary signals, which can be modeled as mixtures of signals
from several information sources. The proposed algorithm is based on
data compression principles and relaxed continuous optimization. We
present theoretical discussions, which show that our algorithm is
asymptotically optimal. Numerical results show that the proposed
algorithm is effective, robust, and has low computational
complexity. The proposed algorithm can be used to solve various
multimedia signal segmentation, analysis, and pattern recognition
problems.